Visual object category recognition
نویسنده
چکیده
We investigate two generative probabilistic models for category-level object recognition. Both schemes are designed to learn categories with a minimum of supervision, requiring only a set of images known to contain the target category from a similar viewpoint. In both methods, learning is translation and scale-invariant; does not require alignment or correspondence between the training images, and is robust to clutter and occlusion. The schemes are also robust to heavy contamination of the training set with unrelated images, enabling them to learn directly from the output of Internet Image Search engines. In the first approach, category models are probabilistic constellations of parts, and their parameters are estimated by maximizing the likelihood of the training data. The appearance of the parts, as well as their mutual position, relative scale and probability of detection are explicitly represented. Recognition takes place in two stages. First, a feature-finder identifies promising locations for the model’s parts. Second, the category model is used to compare the likelihood that the observed features are generated by the category model, or are generated by background clutter. The second approach is a visual adaptation of “bag of words” models used to extract topics from a text corpus. We extend the approach to incorporate spatial information in a scale and translation-invariant manner. The model represents each image as a joint histogram of visual word occurrences and their locations, relative to a latent reference frame of the object(s). The model is entirely discrete, making no assumptions of uni-modality or the like. The parameters of the multi-component model are estimated in a maximum likelihood fashion over the training data. In recognition, the relative weighting of the different model components is computed along with the model reference frame with the highest likelihood, enabling the localization of object instances. The flexible nature of both models is shown by experiments on 28 datasets containing 12 diverse object categories, including geometrically constrained categories (e.g. faces, cars) and flexible objects (such as animals). The different datasets give a thorough evaluation of both methods in classification, categorization, localization and learning from contaminated data. This thesis is submitted to the Department of Engineering Science, University of Oxford, in fulfilment of the requirements for the degree of Doctor of Philosophy. This thesis is entirely my own work, and except where otherwise stated, describes my own research. Robert Fergus, New College Copyright c ©2005 Robert Fergus All Rights Reserved
منابع مشابه
The importance of visual features in generic vs. specialized object recognition: a computational study
It is debated whether the representation of objects in inferior temporal (IT) cortex is distributed over activities of many neurons or there are restricted islands of neurons responsive to a specific set of objects. There are lines of evidence demonstrating that fusiform face area (FFA-in human) processes information related to specialized object recognition (here we say within category object ...
متن کاملDiscriminative Object Categorization with External Semantic Knowledge
Visual object category recognition is one of the most challenging problems in computer vision. While effortless for humans, it is inherently difficult for machines because of the visual variations such as lighting, pose, clutter and occlusion. Even assuming that we can obtain perfect instance-level visual representations, the object category recognition problem still remains difficult for machi...
متن کاملUnknown Object Identification Using Category Visual Words with Rejection Function
In this paper, we introduce an identification method for unknown category objects. Most popular conventional methods in object recognition use Bag of Features (BoF) that represents the image as an appearance frequency histogram of common visual words by quantizing SIFT features. However, this method is unable to identify unknown objects because the common visual words cannot represent the unkno...
متن کاملScene and Object Recognition with Supervised Nonlinear Neighborhood Embedding
Image category recognition is important to access visual information on the level of objects and scene types. In this paper, we develop a Supervised Nonlinear Neighborhood Embedding (SNNE) subspace algorithm of different visual features for object and scene recognition, which learns an adaptive nonlinear subspace by preserving the neighborhood structure of the visual feature space. In the propo...
متن کاملA Neural Network Model of Visual Object Recognition Impairments after Brain Damage
Dysfunction of the visual object recognition system in humans is briefly discussed and a basic connectionist model of visual object recognition is introduced. Experimentation in which two variants of this model are lesioned is undertaken. The results suggest that the well documented phenomenon of superordinate preservation is model independent. Differential category specific recognition deficit...
متن کاملVisual recognition: as soon as you know it is there, you know what it is.
What is the sequence of processing steps involved in visual object recognition? We varied the exposure duration of natural images and measured subjects' performance on three different tasks, each designed to tap a different candidate component process of object recognition. For each exposure duration, accuracy was lower and reaction time longer on a within-category identification task (e.g., di...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2005